Improving Credibility of Machine Learner Models in Software Engineering

Author

  • Gary D. Boetticher
Abstract

Given a choice, software project managers frequently prefer traditional methods of making decisions rather than relying on empirical software engineering (empirical/machine learning-based models). One reason for this choice is the perceived lack of credibility associated with these models. To promote better empirical software engineering, a series of experiments is conducted on various NASA datasets to demonstrate the importance of assessing the ease or difficulty of a modeling situation. Each dataset is divided into three groups: a training set, a “nice neighbor” test set, and a “nasty neighbor” test set. Using a nearest neighbor approach, “nice neighbors” align closest to training instances of the same class, while “nasty neighbors” align closest to training instances of the opposite class. The “nice” and “nasty” experiments average 94% and 20% accuracy, respectively. Another set of experiments shows that ten-fold cross-validation is not sufficient to characterize a dataset. Finally, a set of metric equations is proposed for improving the credibility assessment of empirical/machine learning models.
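The nice/nasty partition described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance, a single nearest neighbor, and binary class labels; the function names and the toy data are hypothetical and are not taken from the paper, which may define the split differently.

```python
import math

def nearest_label(point, training):
    """Return the class label of the training instance closest to `point`."""
    # Assumption: Euclidean distance; the paper's distance measure is not stated here.
    _, label = min(training, key=lambda row: math.dist(point, row[0]))
    return label

def split_nice_nasty(candidates, training):
    """Partition candidate test instances by the class of their nearest training neighbor."""
    nice, nasty = [], []
    for features, label in candidates:
        if nearest_label(features, training) == label:
            nice.append((features, label))   # nearest neighbor has the same class
        else:
            nasty.append((features, label))  # nearest neighbor has the opposite class
    return nice, nasty

# Toy data (hypothetical): two training instances, one per class.
training = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
candidates = [((1.0, 1.0), 0), ((9.0, 9.0), 0), ((8.0, 9.0), 1), ((2.0, 0.0), 1)]

nice, nasty = split_nice_nasty(candidates, training)
print("nice:", nice)
print("nasty:", nasty)
```

A plain 1-nearest-neighbor classifier would score 100% on such a nice set and 0% on the nasty set by construction, so the 94% and 20% averages reported above presumably come from learners evaluated on these test sets rather than from the partitioning step itself.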

Similar resources

CREDIBILITY-BASED FUZZY PROGRAMMING MODELS TO SOLVE THE BUDGET-CONSTRAINED FLEXIBLE FLOW LINE PROBLEM

This paper addresses a new version of the flexible flow line problem, i.e., the budget-constrained one, in order to determine the required number of processors at each station along with the selection of the most economical process routes for products. Since a number of parameters, such as due dates, the amount of available budgets and the cost of opting particular routes, are imprecise (fuzz...

When Will It Be Done? Machine Learner Answers to the 300-Billion-Dollar Question

“When will it be done?” Senior managers will ask their software project managers this question more than 250,000 times this year. Corporations, which collectively commit over US$300 billion annually toward new software project initiatives, will want to know the answer. However, when you consider Barry Boehm's claim that early software life-cycle estimates vary by a factor of four (25 to 400 ...

Software Quality Modeling with Limited Apriori Defect Data

In machine learning the problem of limited data for supervised learning is a challenging problem with practical applications. We address a similar problem in the context of software quality modeling. Knowledge-based software engineering includes the use of quantitative software quality estimation models. Such models are trained using apriori software quality knowledge in the form of software me...

Software Engineering and Simulation Credibility

Most people think of “validation” as the hallmark of simulation credibility. But some simulations, by their very nature (e.g., mission level models, highly complex physics-based simulations, etc.) are notoriously difficult to validate. There are also situations in which the process of validation, even if feasible, cannot keep pace with the dynamic nature of simulation evolution, or where the co...

Improving the Inference of Gene Expression Regulatory Networks with Data Aggregation Approach

Introduction: The major issue for the future of bioinformatics is the design of tools to determine the functions and all products of single-cell genes. This requires the integration of different biological disciplines as well as sophisticated mathematical and statistical tools. This study revealed that data mining techniques can be used to develop models for diagnosing high-risk or low-risk lif...

Publication date: 2006